 reaction data


Chemical knowledge-informed framework for privacy-aware retrosynthesis learning

arXiv.org Artificial Intelligence

Chemical reaction data is a pivotal asset, driving advances in competitive fields such as pharmaceuticals, materials science, and industrial chemistry. Its proprietary nature renders it sensitive, as it often includes confidential insights and competitive advantages that organizations strive to protect. However, in contrast to this need for confidentiality, the current standard training paradigm for machine learning-based retrosynthesis gathers reaction data from multiple sources into a single central location to train prediction models. This paradigm poses considerable privacy risks because it requires broad data availability across organizational boundaries and frequent data transmission between entities, potentially exposing proprietary information to unauthorized access or interception during storage and transfer. In the present study, we introduce the chemical knowledge-informed framework (CKIF), a privacy-preserving approach for learning retrosynthesis models. CKIF enables distributed training across multiple chemical organizations without compromising the confidentiality of proprietary reaction data. Instead of gathering raw reaction data, CKIF learns retrosynthesis models through iterative, chemical knowledge-informed aggregation of model parameters. In particular, the chemical properties of predicted reactants are leveraged to quantitatively assess the observable behaviors of individual models, which in turn determine the adaptive weights used for model aggregation. On a variety of reaction datasets, CKIF outperforms several strong baselines by a clear margin (e.g., ~20% performance improvement over FedAvg on USPTO-50K), demonstrating its feasibility and encouraging further research on privacy-preserving retrosynthesis.
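
The abstract describes aggregating model parameters with adaptive weights derived from each model's assessed behavior, but gives no formula. As a rough illustration only, the sketch below shows one generic way such score-weighted parameter averaging could look. The softmax weighting, the `temperature` parameter, the function names, and the dictionary-of-lists parameter layout are all assumptions made for this example and are not taken from CKIF.

```python
import math


def softmax(scores, temperature=1.0):
    """Turn per-client scores into weights that sum to 1 (assumed scheme)."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - highest) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(client_params, scores, temperature=1.0):
    """Weighted average of client parameters.

    client_params: list of dicts mapping a parameter name to a list of floats,
                   one dict per participating organization (hypothetical layout).
    scores:        one quality score per client; a higher score gives that
                   client's parameters more influence.
    """
    weights = softmax(scores, temperature)
    reference = client_params[0]
    return {
        name: [
            sum(w * params[name][i] for w, params in zip(weights, client_params))
            for i in range(len(reference[name]))
        ]
        for name in reference
    }


if __name__ == "__main__":
    clients = [{"w": [0.0, 2.0]}, {"w": [2.0, 4.0]}]
    # Equal scores reduce to a plain, unweighted average of the parameters.
    print(aggregate(clients, scores=[1.0, 1.0]))
    # A much higher score for the second client pulls the result toward it.
    print(aggregate(clients, scores=[0.0, 10.0]))
```

With equal scores this reduces to a uniform average; only parameters and scores are exchanged, and no raw reaction data leaves a client.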


ReactionT5: a large-scale pre-trained model towards application of limited reaction data

arXiv.org Artificial Intelligence

Transformer-based deep neural networks have revolutionized the field of molecular-related prediction tasks by treating molecules as symbolic sequences. These models have been successfully applied in various organic chemical applications by pretraining them with extensive compound libraries and subsequently fine-tuning them with smaller in-house datasets for specific tasks. However, many conventional methods primarily focus on single molecules, with limited exploration of pretraining for reactions involving multiple molecules. In this paper, we propose ReactionT5, a novel model that leverages pretraining on the Open Reaction Database (ORD), a publicly available large-scale resource. We further fine-tune this model for yield prediction and product prediction tasks, demonstrating its impressive performance even with limited fine-tuning data compared to traditional models. The pre-trained ReactionT5 model is publicly accessible on the Hugging Face platform.


Yield-predicting AI needs chemists to stop ignoring failed experiments

#artificialintelligence

Machine-learning algorithms that can predict reaction yields have remained elusive because chemists tend to bury low-yielding reactions in their lab notebooks instead of publishing them, researchers say. 'We have this image that failed experiments are bad experiments,' says Felix Strieth-Kalthoff. 'But they contain knowledge, they contain valuable information both for humans and for an AI.' Strieth-Kalthoff from the University of Toronto, Canada, and a team around Frank Glorius from Germany's University of Münster are asking chemists to start including not only their best but also their worst results in their papers. This, as well as unbiased reagent selection and reporting experimental procedures in a standardised format, will allow researchers to finally create yield-prediction algorithms. Retrosynthesis is already using machine-learning models to create shorter, cheaper or non-proprietary synthetic routes. But there have been few attempts at creating programs that predict yields.


How Does AI Fit Into Health Care's Priorities Of 2018?

#artificialintelligence

Interest in artificial intelligence (AI) is exploding, with Accenture forecasting that AI in health care will grow to $6.6 billion in a few short years, at a 40% annual compounded growth rate. Accenture also believes this technology will enable an opportunity for $150 billion in industry savings. So, is this hype justified? The short answer is yes, but it belies a much deeper question: How do we weed out the hype and determine exactly what is the most effective role for AI so that we make the rest of 2018 a year for positive change and not disruptive chaos? AI can augment a physician's thought process and how he or she reasons out a problem.


5 key takeaways from a new report on AI, machine learning in radiology

#artificialintelligence

Research firm Reaction Data has published a new report, "Machine Learning in Medical Imaging," that breaks down what radiologists and other imaging professionals think about AI, machine learning and the future of radiology. "The primary motivation behind this study is the sheer amount of hype going on in healthcare, specifically in radiology and imaging, around AI--deep learning and machine learning," the report's authors wrote. "In essence, the machine learning buzz is, quite literally, through the roof." Reaction Data received input from more than 130 industry professionals. While 45 percent of respondents were directors of radiology, another 20 percent were radiologists and 9 percent were imaging directors.